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ABSTRACT 

We extend the ILC method in harmonic space to include the error in its CMB es- 
timate. This allows parameter estimation routines to take into account the effect of 
the foregrounds as well as the errors in their subtraction in conjunction with the ILC 
method. Our method requires the use of a model of the foregrounds which we do not 
develop here. The reduction of the foreground level makes this method less sensitive 
to unaccounted for errors in the foreground model. Simulations are used to validate 
the calculations and approximations used in generating this likelihood function. 
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1 INTRODUCTION 

The Cosmic Microwave Background (CMB) was emitted 
when our universe was approximately 380,000 years old, and 
provides a window into the physics of our early universe. 
In particular, most of the information about early physics 
is encoded in the angular power spectrum of the tempera- 
ture anisotropies (e.g. Hu fc Dodelson|20 C)2), for the reason 
that all of the information of a statistically isotropic Gaus- 
sian random field is encoded in its angular power spectrum, 
and the deviations of the CMB anisotropies from this are at 
most small, as estimates of the primordial non-Gaussianity 
of the CMB are consistent with zero as measured by WMAP 
(Komatsu et al.|2009| |2011b. While the CMB we observe is 



known to not be exactly Gaussian due to the gravitational 
lensing of the interveni ng matter |Z aldarriaga & Sclja k[T999| 
lOkamoto & Hu||2003| iLewis & Challinor||2006|), the power 



spectrum remains a powerful tool for extracting information 
about early physics (Lewis 2005). One difficulty is that there 
are a number of sources between us and the CMB that emit 
radiation in the same frequency range. A commonly- used 
method for subtracting these foregrounds from the CMB is 
known as Internal Linear Combination (ILC). This method 



was first described in Tegmark (19981. It is internal in the 



sense that it only depends upon data related to the exper- 
iment at hand. And it is linear in that the estimate of the 
CMB is computed as a linear combination of the observed 
temperature maps, subject to the constraint that the sum 
of the linear weights is equal to one. This constraint ensures 
that with the maps calibrated in thermodynamic units with 
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the CMB emission being the same in all frequencies, the re- 
sultant estimate of the CMB has unit response to the CMB. 



In Bennett et al. (20031, a nonlinear fitting routine was 



used to compute the set of linear weights which minimized 
the variance of the output map. Later, Eriksen et al. (20041 
showed that the ILC result could be computed analytically 
by the use of a Lagrange multiplier, as shown previously 



Tegmark (19981. The WMAP team later extended their 



ILC method in Hinshaw et al. ( 2007 1 to include an attempt 



to remove the bias from the ILC method through a series 
of Monte Carlo simulations. They detected only a small 
bias in these simulations. However, we argue in Appendix 
B] that this method may severely underestimate the ILC 
bias. They make no mention of incorporating any error in 
their ILC estimate, and as a result avoid using the ILC map 
for most of their cosmological studies, instead opting for 
a template fitting method whose noise properties are more 
easily-understood. They do make use of their ILC map for 
their low-^ likelihood (jHinshaw et al.||2009| |Larson et al.| 



2011 1, but because they perform the ILC in pixel space, and 



because they do not compute the ILC weights in the same 
region of the sky where they apply them, they are unlikely 
to suffer problems related to the ILC bias. Furthermore, as 
they are strongly dominated by cosmic variance errors for 
their low-£ analysis, their lack of detailed consideration of 
the noise properties of their ILC map is unlikely to under- 
mine their analysis. 

Extending the pixel-based analysis performed in |Ben 



ton 



nett et al. (20031, Tegmark, de Oliveira-Costa & Hamil- 



( 2003 1 estimated the weights independently on subdivi- 



sions in both pixel space and in harmonic space to produce 
a high-resolution CMB map. For validation, they demon- 
strate that their results are similar to the WMAP team's 
Bennett et al.|(|2003[). A similar analysis was performed 



Delabrouille et al. (20091; Basak & Delabrouille (20121 
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where the subdivision of the sky was performed on needlets, 
which are localized in both pixel space and harmonic space. 
They included a calculation of the ILC bias that results from 
the fact that in minimizing the total variance of the output 
CMB estimate, the ILC method can exploit chance corre- 
lations between the CMB and the foregrounds to partially 
cancel the CMB. However, this calculation assumes that the 
needlet coefficients are independent, which is not exactly the 
case. Validation of their results were performed using Planck 
Sky Model simulations. 

If we are working with the full-sky, however, the har- 
monic coefficients are independent. Taking advantage of this 
fact, a harmonic-space analysis was examined in |Saha et al.| 
(20081 who compute a similar bias, while using simulations 



1 — n 

(C t ) = (C t ) + =^(Ct) + 



etN^e 



(1) 



to estimate the bias and the error in the bias. Meanwhile, 
Kim, Naselsky fc Christensen| ( |2008[ ) also investigated ILC 
in harmonic space, and made use of an iterative method in 
an attempt to reduce the foreground bias. However, this it- 
erative method is likely to be susceptible to the same pitfall 
that we describe in Appendix [B] because their estimate of 
the foregrounds is simply the data minus their ILC estimate 

of the CMB. 

Moving beyond temperature analysis, Amblar d, Cooray] 
& Kaplinghat ( 2007 ) made use of the ILC technique to sub- 



tract the foregrounds in harmonic space for use in measur- 
ing the E and B-mode polarization of the CMB, while Efs- 



tathiou, Gratton fc Paci| (j2009) argued that the bias induced 
by the ILC method was too large to allow for estimation 
of the CMB B-mode signal, instead opting for a template- 
based parametric foreground removal method. We do not 
consider polarization here, as though all of our calculations 
would remain unchanged for the E and _B-mode power spec- 
trum estimates, the effect of ILC on the TE cross spectrum 
is far more difficult, and polarization has a much larger fore- 
ground signal compared to the CMB than is the case with 
temperature data, suggesting a dedicated analysis would be 
preferred. 

The previous methods have all made use of simulations 
for validation, with the furthest any has gone to compute the 
errors being to estimate the variance of a series of simulation 
results. Ideally, we would like to have a way to compute the 
errors in the ILC subtraction method, or at least a method 
for propagating the errors in a foreground model. To this 
end, we describe the derivation of our approximation to the 
true likelihood that a given theoretical model matches the 
harmonic-space ILC output in Section [2] This is followed by 
our validation procedures in Section [3] and a discussion of 
what is required to make use of this likelihood function in a 
realistic experiment in Section [4] 



2 DERIVATION OF THE ILC POWER 
SPECTRUM LIKELIHOOD 

In this paper, we follow the same harmonic-space Inter- 



nal Linear Combination (ILC) analysis used in Saha et al 



( 2008 1 . Performing the analysis in harmonic space allows 



us to take advantage of the independence of the harmonic 
coefficients of the CMB to simplify the calculation. Their 
primary result was an estimate of the bias in the power 
spectrum of the estimated CMB map induced by the ILC 
method, which we reproduce here: 



Here {Ct) is the expectation value of the power spectrum of 
the ILC-estimated CMB map, Ce is the true power spectrum 
of the CMB, n c is the number of channels, e is a vector 
of all ones, N^ is the channel-channel covariance matrix of 
the foregrounds plus noise at each I, a nd ' is the conjugate 
transpose. Following 



Saha et al. 



( 2008 1, Nf is assumed to be 



fixed and known, with the expectation value only over CMB 
realizations. We discuss how to relax this assumption later 
in Section [4] 

We can understand the three main terms of this equa- 
tions as follows. The first term is simply the power spectrum 
of the CMB, which comes out with a factor of unity because 
of the assumption of the frequency scaling of the CMB sig- 
nal combined with the assumption of perfect calibration. 
As shown in Dick, Remazeilles & Delabrouille (20101, cal- 



ibration error that is significant compared to the ratio of 
noise to signal can result in pathological behavior of the 
ILC method. We assume in this paper that the data is of 
sufficient quality that such pathological behavior is avoided, 
though it may be worth investigating later whether or not 
this is a valid assumption, and whether our likelihood anal- 
ysis can be modified to include this possibility. 

The second term in the equation is the bias due to 
chance correlations between the CMB and the foregrounds. 
It is simply proportional to the power spectrum of the CMB 
because the Gaussianity of the CMB allows us to ignore 
many of the properties of the foregrounds entirely, such as 
their spatial correlations. The only assumption here is a full- 
rank foreground covariance (this assumption breaks at very 
low I, if 2£ + 1 < n c , but can be recovered by binning the 
power spectrum). 

The final term is a contamination term: it is the power 
spectrum of the ILC method performed on the foreground 
plus noise signal alone. The magnitude of this term de- 
pends upon the frequency coverage of the instrument: higher 
frequency coverage allows for better removal of the fore- 
grounds, reducing the magnitude of this contamination. 

In the following subsection, we extend this calculation 
to an estimate of the variance on the power spectrum, which 
allows us to develop an approximation to the full probability 
distribution. 



2.1 Variance of the estimated power spectrum 

The variance of the ILC estimate of the CMB can be com- 
puted using the same general method as used in comput- 
ing the bias. However, we differ substantially in the specific 
calculation from that described in Saha et al. (20081. In 



particular, they make the assumption that the foreground 
covariance is not full rank. Because of the instrument noise 
ensuring that this covariance is indeed full rank, we do not 
make this assumption. So first we describe the much shorter 
calculation of the ILC bias, then present an outline of the 
calculation for the variance of the ILC estimate. A fuller 
description of the calculation can be found in Appendix [A"] 
First, we start with a few definitions. As the calcula- 
tions are performed in harmonic space, ae m is the spherical 
harmonic transform of the true CMB, and we define Jg m as 
the spherical harmonic transform of everything except the 
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CMB, which includes foregrounds and instrument noise. As 
the foreground plus noise signal in the sky varies from fre- 
quency to frequency, this latter term is a vector for each (., 
m pair. The spherical harmonic transform of the observed 
sky, then, can be written as: 



ea.f m + it, 



(2) 



Here di m is a vector representing the spherical harmonic 
transforms of the observations at each frequency. The ILC 
method selects a series of linear weights w for each channel 
which minimizes the variance of the output subject to the 
constraint that w'e = 1. This constraint forces the contri- 
bution of the CMB in the final result to be unity under the 
assumption that the maps are accurately calibrated in ther- 
modynamic units (where the CMB anisotropies are identical 
in each channel). Minimizing the variance of the estimated 
CMB aim results in: 



"In 



t ^c," 1 

W 1 = jt-1 . 

etC7 e 



(3) 

(4) 



Here Ce is the empirical channel-channel covariance of the 
observations at each £, defined as: 



21+1 ^ 



* k df m dl 



1 '™ a (m' 



(•») 



The fact that di m is the spherical harmonic transform of a 
set of real-valued maps on the sphere ensures the matrix Ce 
only has real-valued elements. This becomes in the compu- 
tation of the variance. 

With these definitions, it is easy to show that the power 
spectrum of the recovered map takes on the simple form: 



a 



etC^e 



(6) 



From here we can compute how the resultant power 
spectrum depends upon the true CMB and the foregrounds 
plus noise. This is done by substituting Equation S into 
the expression for the covariance, giving: 



C, = 



(a lm a* lm ee + ae m ef tm + f lm e a|„ 



•wL)-(7) 



The first term is simply the power spectrum of the true 
CMB signal. The last term is the covariance of the fore- 
grounds plus noise. The two terms in the middle are due to 
the chance correlation between the CMB and foregrounds 
plus noise. We can simplify this equation using notation sim- 



ilar to that in Saha et al. ( 2008[l by defining 






LlmO-im- 



(8) 



As before, the fact that both f and a are spherical 
harmonic transforms of real-valued maps ensures that the 
chance correlation between the CMB and foregrounds plus 
noise fe is also real- valued. Equation (|7| can then be written 
as: 



C e = Ci ee f + eft + f^e 1 " + N* 



(9) 



Here we have introduced Nf to represent the covariance of 
the foregrounds plus noise. If we combine this equation with 
the identity (A + be 1 ") -1 = A -1 - A" 1 A _1 bc t A _1 , where 
A = 1 + c t A _1 b, we can obtain an analytical form for the 
power spectrum of the ILC estimate of the CMB. 

The use of this identity requires that both Ci and N^ be 
invertible, which also implies that A/0. This is the case for 
instrument noise alone, and thus is the case for foregrounds 
plus instrument noise. After applying this identity to the 
inverse of the covariance of the observations three times and 
substituting the result into Equation Q, we obtain: 

Ce = Ce 



etN^e 

-fJN^fV. 



(10) 



Making use of the independence of the aim values, with 
(o,im- L 0'i m2 ) = S mim2 Ce, it is easy to show that the expecta- 
tion value of the above equation is Equation (TlJ> . 



2.2 Variance of the bias 

While we leave a fuller explanation of the computation of 
the variance in the ILC bias to Appendix [A] here we sketch 
an outline of the general process. As a first step in the cal- 
culation, we subtract the expectation value (Ce) from the 
estimate Ce: 

Ce - {Ce) = 

C e - f|N7 1 f, 



+ - 



1 + e^Nj% + f/N^e + e^J%{JNj L e 



etN^e 



■w-iSw-sk- 



("I 



This allows us to group Ce with its expectation value, and 
cancel 1/e^Nj" e, leaving us with an equation with no more 



terms than in Equation ( 10 1. This gives us 



5Ce 



sc, 



f T lNT 
l e L ^i 



1 



l (Ce) 



+ - 



21+1 
etN^e 



(12) 



The next step involves squaring Equation ( 12 l and tak- 
ing the expectation value, the full calculation of which is rel- 
egated to Appendix [A] Here we simply note that the square 
of Equation ( |12[ ) can be considered in two groups: parts that 
have even powers of the aim values, and parts that have odd 
powers of the aim values. When we take the expectation 
value, only those components that have even powers in the 
aim values survive, so we can immediately throw out a large 
number of the cross-terms in the square, making this rather 
cumbersome equation more manageable. The end result is 
the following: 



(6Ct) = ^A((C e ) + ^(Ce) 



2^+1 



2^+1 



etN: 



(13) 



Equation ([13]) is the main result of our paper and can be 
broken down as follows. The first term is simply the standard 
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cosmic variance term. The second term is the modification 
of the cosmic variance term that stems from the existence 
of the ILC bias. There is no term related to the contamina- 
tion of the ILC result by the foregrounds plus noise, because 
that term is assumed to be fixed. The final term proportional 
to 2/e t N^ 1 e instead comes from the square of the correla- 
tion term (e^N^ff + fJN^T 1 e)/e t N^ 1 e, indicating that this 
represents the variance of the zero-mean component of the 
chance correlations between the CMB and foregrounds. 

It should be noted at this point that the variation in the 
noise, as well as our uncertainties in the foreground model 
itself, have not yet been taken into account. Instead, we are 
providing a likelihood function which is a function of the the- 
oretical power spectrum and a specific realization of the fore- 
grounds and noise. The uncertainty in the foreground plus 
noise signal must be taken into account separately. Mini- 
mal modifications of existing parameter estimation pipelines 
may be ideal for taking into account the fact that the fore- 
grounds plus noise are not perfectly-known. We discuss this 
in more detail in Section [4] 

For now, however, we produce a modification of Equa- 



mean and variance are: 



tion ( 13 1 making the assumption that the foregrounds plus 



noise are Gaussian-distributed. This results in the addition 
of a cosmic variance-like term, but making use of the fore- 
ground power spectrum instead: 



(5Cf) 



2(C) 
2^ + 1 



(Ct) + 



1 



Tic 



+ - 



2£+l 

2 



(Ct) + 



etisn 



2^+1 VetNr'e 



(14) 



While this approximation is useful for obtaining a rough es- 
timate of the total error in the extracted CMB, we would not 
recommend using it for a serious analysis without detailed 
verification due to its ad-hoc nature. 



2.3 Approximating the full probability 
distribution 

The previous analysis indicates that we can estimate the full 
probability distribution of the power spectrum that results 
from the ILC method given only a model of the true power 
spectrum and a model of the covariance of the foregrounds, 
which would allow the use of these calculations in CMB 
parameter estimation. To do this, we go back and examine 
each term in the mean and variance of the ILC-estimated 
CMB sky. 

In particular, when we square 5Ci and take the expecta- 
tion value, we are left with terms which stem from the fourth 
power of the spherical harmonic transform coefficients (ae m ) 
and terms which stem from the second power of the spheri- 
cal harmonic transform coefficients. The fourth power terms 



become the first two terms in Equation ( 13 1, and the second 
power terms become the third. 

We can surmise that the fourth power terms collectively 
act as a modified \ 2 distribution in Ce, while the second 
power terms act as a Gaussian, and we can estimate the 
parameters of these respective distributions by comparing 
the mean and variance of these two terms. 

For a x 2 distribution with k degrees of freedom, the 



(Xk) 

<(xl - (xi)) 2 ) 



2k. 



(15) 
(16) 



The distribution of Ce differs from this distribution by a 
factor of {Ci)/k, with k = 21 + 1. We can thus parameterize 
this distribution with two parameters, a 2 — {Ce) and k — 
2£ + 1. The mean and variance of Ce then becomes: 

{Ci) = o\ (17) 



({C-iOf) = 



2a^ 
k 



(18) 



Similarly, we can define new parameters appropriate for the 
first two terms in the mean and variance of Ce: 

1- 



= 1 + 



2^+1 



k 



1 + 



2^+1 



(Ce), 

(2^+1). 



(19) 
(20) 



The remaining terms in the mean and sigma can be 
modeled as a constant offset (l/e t N^" 1 e) and a zero-mean 
Gaussian with variance equal to 4{Ce)/((2£ + l^N^e). 
Therefore our approximation to the full probability distri- 
bution of C'e is modeled as the sum of two random variables, 
one following a Gaussian distribution and the other follow- 
ing a modified \ 2 distribution. 

2.4 Likelihood evaluation 

In order to estimate the likelihood of a cosmological model 
given the data, we need to write down the full probability 
distribution of the sum of a modified \ 2 distribution and a 
Gaussian. To examine this, we consider the random variable 
x which has a probability distribution P x (x), the random 
variable y which has a probability distribution P y (y), and 
their sum s which has a probability distribution P s (s). With 
these definitions, it is easy to see that the probability of the 
sum s is: 



Ps(s) = 



P x (x)P y (y)S(s -x- y)dxdy. 



(21) 



From this, we can write the unnormalized probability dis- 
tribution of the estimated Ce as: 



P(dt\Ct,TXe) <x / x*- x e 



2 "l dx. 



(22) 



where a k is the variance of the modified x distribution, and 
cjq is the variance of the Gaussian distribution. This integral 
is easily computed numerically. 



3 VALIDATION 

In order to validate this likelihood approximation, we per- 
form a series of simulations. The first set of simulations is 
performed merely to verify that our calculations were ac- 
curate, using 10 5 simulations at a single multipole moment 
with a toy model of the foregrounds. The second simulation 
is a full Planck Sky Model simulation where we could verify 
that this likelihood estimate holds up under a more realistic 
scenario, and further provides some insight into the likely 
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Figure 1. The mean and error bars given by the square root of 
the variance of a set of 10 5 simulations are compared against the 
expected results in Equations pi and j!3| . The differences are 
indistinguishable in the plot, and numerically differ by less than 
0.3%, the expected deviation from 10 simulations. In using test 
data that precisely match the assumptions used in deriving the 
mean and variance, these toy simulations verify that Equations 
pi and p3| were derived correctly. 



relative magnitudes of the various terms in the likelihood 
for real data. 



3.1 Toy simulations 

For toy simulations, we use a minimalistic mathematical 
model purely in order to verify the calculations in this pa- 
per were performed accurately. To this end, we take the 
toy CMB to be a set of unit-variance independent complex 
Gaussian random variables (the ai m values), constrained so 
that at m — ai- m - To ensure numerical accuracy, we ran the 
simulation over one million realizations of these Gaussian 
random variables for a single £. 

For the foregrounds plus noise, since the calculations 
assumed a fixed foreground plus noise signal, we generate 
one set of correlated complex Gaussian random variables 
for each of nine channels. The correlations are produced by 
making use of a randomly-generated covariance matrix for 
each of the nine channels (the square of a random symmetric 
matrix with unit variance elements). This covariance matrix 
is then divided by a factor of 100 in order to make it rela- 
tively small compared to the variance of the toy CMB. The 
mean and variance of these results for three different choices 
of £ are shown in Figure [I] 

Secondly, in order to validate the analytical approxima- 
tion to the full probability distribution, we bin the results of 
the above simulations and compare them against the inte- 
gral of the probability distribution across the bin (Figure [2J) . 
In each case, the analytical approximation to the full prob- 
ability distribution matches the simulations very well. The 
correspondence is not expected to be perfect. However, the 
analytical approximation is clearly superior to the Gaussian 
approximation at low to medium £, with the two becoming 
nearly indistinguishable at I values of around a few hundred. 



3.2 Analysis of realistic simulations 

In order to verify our likelihood function in a more realis- 
tic situation, we use a single realization of the foregrounds 
at all nine Planck frequency channels using the Planck Sky 
Model (described in Section 3.2.11. To the simulated fore- 



grounds, we add a single realization of the noise. This re- 
alization is computed as uncorrelated, non-uniform noise, 
utilizing the same hit maps and per-sample variances as in 



Leach et al. (20081. To the simulated sky and noise we add 



a simulated lensed CMB using the LensPix software pack- 



age (Lewis 2005). A lensed CMB realization is used for the 



reason that the lensing breaks the assumption of complete 
independence of the ap m values, and is known to be a real 
feature in the observed CMB. Taking this non- ideality into 
account is thus important to help ensure that our method 
will be applicable to real data. 



3.2.1 Planck Sky Model 

Sky maps used in the present work are based on simulations 
obtained with the pre-launch Planck Sky Model softwarqM 
a code designed to generate realistic full-sky maps of sky 
emission at millimetre wavelengths. A complete description 



of the PSM code is given in Delabrouille et al. (20121. We 



summarize here the main aspects of the particular simula- 
tion we use. To make a realization of the foregrounds, the 
PSM generates maps of different diffuse and compact source 
components, which are then coadded. For most of them, re- 
alizations are based on observed data sets, complemented 
by theoretical modelling where and when observations are 
missing. The map for each component is then computed 
at a set of observation frequencies on the basis of a mod- 
elled emission law for the component. In addition to CMB 
anisotropics, the components of sky emission implemented 
in the PSM include point sources (galactic and extragalac- 
tic), Sunyaev-Zel'dovich effects (thermal and kinetic), dif- 
fuse galactic emission, and fluctuations of the far infrared 
background emission from high redshift galaxies. 

In the present simulations, diffuse emission from the 
galactic interstellar medium (ISM) is modelled as the sum 
of four main components: thermal emission of small dust 
particles heated by radiation from nearby stars; synchrotron 
emission, due to relativistic electrons spiralling in the galac- 
tic magnetic field; Free-free emission from warm regions of 
ionised interstellar gas; and the emission of small rotating 
dust grains. 

The thermal dust emission is modelled on the basis of 



model 7 of Finkbeiner, Davis fc Schlegel (19991. The total 



emission is represented as the sum of two modified black- 
bodies of the form: 



I v = Y^N i e i v Pl B v (Ti l 



(23) 



where B v (Ti) is the Planck function at temperature Ti, JVj 
is the column density of species i, and e^ i accounts for the 
normalization and frequency dependence of the emissivity. 
Low frequency diffuse foregrounds (synchrotron, free- 
free ad spinning dust) are based on the analysis of WMAP 



http : //www . ape . univ-par is7 . f r/~delabrou/PSM/psm . html 
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Toy model, £ = 10 



£ 0.030 - 
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1 1. 1 III.-, 
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Toy model, £ = 100 



a 




Toy model, £ = 500 




Figure 2. This test of the approximations to the probability distribution of Cg shows that the analytical approximation, which models 
the probability as a sum of a chi-square distributed random variable and a Gaussian-distributed random variable, as well as a Gaussian 
approximation for comparison. While not perfect, the analytical approximation is better than the Gaussian at replicating the simulations. 
At high (., the Gaussian approximation and the analytical approximation become nearly indistinguishable in these plots. 



observations performed by Miville-Deschenes et al. (20081 



The synchrotron emission is obtained by scaling with a pixel- 
dependent emission law the template emission map observed 



at 408 MHz by Haslam et al. (19821. In the radio-mm fre- 



quency range, for an electron density following a power law 
of index p, (ne(E) oc Ep), the synchrotron frequency depen- 
dence is also modelled as a power law: 



Isy nc {v) OC z/ s+2 



(24) 



where the spectral index is equal to /3 S = — (p + 3)/2. The 
synchrotron spectral index depends on cosmic ray prop- 
erties. It varies with the direction on the sky, and possi- 
bly, with the frequency of observation (see, e.g., |Strong,| 
Moskalenko & Ptuskin ( 2007 1 , for a review of propagation 



and interaction processes of cosmic rays in the Galaxy). The 
template synchrotron map obtained from Haslam et al., cor- 
rected for an offset monopole of 8.33-fiT (Rayleigh- Jeans), is 
extrapolated in frequency on the basis of the spectral index 
map obtained from WMAP data. 

Free-free is modelled on the basis of a template free-free 
map at 23 GHz, and a single universal emission law as given 



by Dickinson, Davies & Davis ( 2003 1 



In the PSM, the spinning dust emission is represented 
on the basis of a single template, together with a unique 
emission law which can be parametrised following the model 



of Draine & Lazarian (19981. The spinnning dust template 



is obtained by removing every other components contribut- 
ing to the WMAP 23GHz data (free-free, synchrotron and 
thermal dust). 

As the resolution of our simulation (5 arcmin) is bet- 
ter than that of the templates used to construct the model, 
small-scale fluctuations are added to Galactic emission sim- 
ulations. The method used is described in Miville-Deschenes 
et al. (20071. A Gaussian random field G ss having a power 



spectrum defined as: 



d=V 



(25) 



is generated. Here <r s i m and o"tcm are the resolutions (in ra- 
dian) of the simulation (5' here) and of the template to which 
small-scale fluctuations are added. The zero mean Gaussian 
field Gss is then normalized and multiplied by the template 
map exponentiated to a power )3 in order to generate the 
proper amount of small-scale fluctuations as a function of 



local intensity. The resulting template map with small scale 
fluctuation added is then: 



iL 



Item + ttGssI tc 



(26) 



where a and j3 are estimated for each template in order 
to make sure that the power spectrum of the small-scale 
structure added is in the continuity of the large scale part 
of the template. 

The Sunyaev-Zel'dovich effect is modelled as the sum of 
two effects: thermal SZ effect which corresponds to the inter- 
action of the CMB with a hot, thermalised electron gas, and 
the kinetic SZ effect which corresponds to CMB interaction 
with electrons having a net ensemble peculiar velocity along 
the line of sight. For thermal SZ, in the approximation of 
non-relativistic electrons, the change in sky brightness is: 



5I V = yf(u)B u (T C MB) 



(27) 



where B(Tcmb) is the CMB blackbody spectrum, f(u) is a 
universal function of frequency that does not depend on the 
physical parameters of the electron population and y, the 
Compton parameter, is proportional to the integral along 
the line of sight of the electronic density ne multiplied by 
the temperature of the electron gas: 



5y = 



kT e 



n e aTdl 



(28) 



where T e is the temperature of the electron gas, character- 
ized by a Fermi distribution. The kinetic effect, due to the 
interaction of CMB photons with moving electrons, results 
in a shift of the photon distribution seen by an observer on 
Earth: 



5I V = -j3 rT 



dB v {T) 



OT 



(29) 



T- T CMB 

where j3 r is the dimensionless cluster velocity along the line 
of sight. The SZ simulation used in the present work is based 
on two different parts: hydro+N-body simulations of the dis- 
tribution of baryons for redshifts z < 0.025 obtained from 
Dolag et al. (20051, and pure N-body simulations of dark 



matter structures in a Hubble volume, using template from 
a smaller simulation by Schafer et al. (2006). 

Other compact sources are implemented as the sum of 
three populations: radio sources, infrared sources, and galac- 
tic ultra-compact HII regions. 

Radio sources are simulated on the basis of both real 
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radiosources observed around 1 and/or 5 GHz and on simu- 
lated sources generated at random to homogeneize the cover- 
age where the sky surveys are shallow or incomplete. Sources 
fluxes are extrapolated to all frequencies required by the 
simulation using power law approximations of the spectra 
of the sources, of the form S v oc v~ a with different spec- 
tral indices in different frequency domains. Radio sources 
are divided in two populations, steep and flat, with typi- 
cal spectral indexes distributed according to a Gaussian law 
with (a s tecp} = 1.18 and (afl at ) = 0.16 respectively, and a 



variance of a 



steep 



flat = 0.3. 



IR sources are simulated on the basis of a compilation of 
sources observed by IRAS. Fluxes are extrapolated to use- 
ful frequencies using modified blackbody spectra v h B(v,T), 
B(u,T) being the blackbody function. Again, randomly dis- 
tributed sources are added until the mean surface density 
as a function of flux matches everywhere the mean of well 
covered regions. 

The 864 sources identified as ultra-compact Hn regions 
are treated separately from the above two populations. Their 
measured fluxes in the IRAS catalogues at 100 and 60 /im, 
together with counterparts in the radio domain, are fitted 
with the sum of a modified blackbody and a free-free spec- 
trum. 

Finally, the PSM simulations include a background of 
unresolved high redshift faint sources (dusty and star form- 
ing galaxies), which is used to model anisotropies of the 
cosmic infrared background. 

When all these components have been computed for all 
nine Planck frequencies, all observed component maps are 
then coadded to have a full-sky foreground map which is 
then added to our CMB and instrument noise simulations. 



3.2.2 Masking 

One of the difficulties with analyzing experimental data is 
that it is not possible to use the entire sky for the analysis, 
because experimental data may not cover the full sky in 
every channel, and because the brightest parts of the sky 
need to be masked to produce reliable results. Our method, 
if it is to be applicable to experimental data, needs to take 
this into account. This lack of full sky coverage breaks the 
assumption of independence of the ai m values. Therefore, we 
take the strategy of cutting as little of the sky as possible 
and hope that the assumption of independence is broken 
little enough that the analysis remains robust. 

To this end, the masking strategy we use is to take 1.2% 
of the brightest pixels in each frequency map to make a mask 
at each frequency, then combine these masks to ensure any 
pixel masked in any frequency is masked. Point sources are 
not treated separately, though many of the brighter sources 
are masked through this method. In total, this cuts about 
1.6% of the sky. Then, to prevent aliasing, we apodize the 
mask with a 33' width apodization filter (33' is the resolution 
of the 30 GHz channel). The apodization filter we use sets 
the value of any unmasked pixel equal to e" ' where 
S6 is the angular distance to the nearest masked pixel. This 
apodization filter ensures that all masked pixels remain com- 
pletely masked, and both masked point sources and larger 
masked regions have similar apodization applied. For com- 
putational speed, any unmasked pixel that would have a 



value greater than 0.99 in the filter is left untouched. After 
applying the apodization, about 3.3% of the sky is removed 
in total. The sky fraction, / s k y = 0.967, is estimated using 
the average of the squares of the mask pixels, because the 
ae m values are squared when summing the power spectrum. 

The particular masking level is chosen to ensure that 
the mask is as small as possible, to prevent correlations, 
while still not producing obviously wrong values for the low- 
£ power spectrum, as smaller masks tend to produce aliasing 
of the foreground signal into the low multipoles, resulting in 
one or more \ow-£ values being a couple of orders of magni- 
tude larger than expected. Additionally, even slightly larger 
initial masks result in removing many more point sources 
which, after apodization, results in much larger sky fraction 
being removed (e.g. 7.1% removed in total with an initial 
mask size set to 1.5%). 

Even when retaining a large fraction of sky, we risk 
biasing the estimated power spectrum by a few percent if 
we do not take the fractional coverage of sky into account. 
In order to make progress here, we write down the effect of 
the mask in harmonic space as follows: 



{cf 



*) = 



l 



i 

21 +1 ^ 



(ae m Wlm) {aim Wlm)* ■ (30) 



Here the operator represents a convolution, which can 
be performed by converting the spherical harmonic trans- 
form of the mask we m and the spherical harmonic transform 
of the CMB ai m back to pixel space and then multiplying 
the two together pixel by pixel. In practice, this process will 
mix the signals betwen different m and I values together. 
However, if we ignore the correlations induced by this pro- 
cess and assume statistical isotropy of the CMB, then we 
can represent the effect of the mask as a simple rescaling of 
the power spectrum: 



{cf 



*) = 



E 



21 +\ L^, 

m=-l 



dirndlr, 



(31) 



Here w% represents the value of the mask at pixel i, with 
Wi — 1 for pixels that are fully unmasked, Wi = for pixels 
fully masked, and values in between for pixels within the 
apodized region at the borders of the mask. We can thus 
represent this sum over the pixels in the mask as a single 
parameter: 



N p -1 

Uy = ^ E W - 



(32) 



With this definition, we can write our final estimate of 
the probability distribution of Ci as follows: 

1 — n c \ , „ , . 1 



{C e )f sky = 1 + 



IUAI 



{Ci) + 



\SC( )/ s ky — 



2{C e ) 



1 + 



1 — n c 



etN^e 



{d) + 



etNl 



(33) 



(34) 



where n c ff = (21 + l)/ s k y . The motivation for this change 
is that the primary effect of the mask is to apply a weight 
function to the at m values so that instead of sums over 21 + 
1 independent random variables, we have sums over (21 + 
l)/sk y independent random variables. 

While the induced correlations between the Ci values 
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are not taken into account, it is our hope that keeping the 
masked region as small as possible minimizes the impact 
of this choice. In the following subsection we go over the 
tests we have used to ensure that at this level, the likelihood 
function is good enough to accurately capture the power 
spectrum and its uncertainties. 

3.2.3 The Kolmogorov-Smirnov Test 

In order to determine whether or not our approximation 
to the likelihood function is accurate, we make use of a 
Kolmogorov-Smirnov test, using the different £ values as dif- 
ferent samples in the test. The K-S test compares the empir- 
ical cumulative probability distribution of a sample distri- 
bution with the theoretical cumulative probability distribu- 
tion, and provides a statistic which can be used to evaluate 
whether or not the two distributions are the same. 

The Kolmogorov-Smirnov statistic is the area that lies 
between the cumulative probability distributions of a test 
distribution and an empirical distribution. If the empirical 
distribution and the test distribution have the same shape, 
then this K-S statistic converges to zero. In fact, the conver- 
gence to zero occurs at a very specific rate depending upon 
the number of samples, if the two probability distributions 
are the same. 

Correlations that are not accounted for in the test distri- 
bution will tend to reduce the rate at which the K-S statistic 
will converge to zero. To take a simple example, consider a 
situation where the test distribution assumes the samples 
are perfectly uncorrelated, while the input data sets every 
other value equal to the previous. With each value in the in- 
put data appearing twice, the K-S statistic will converge half 
as quickly, leading to a larger than expected K-S statistic. 

Alternatively, if the input data have unmodeled anti- 
correlations, the reverse occurs and the K-S statistic con- 
verges more rapidly to zero than expected. A toy example 
here is to consider the situation where the test and empirical 
distributions both produce numbers between zero and one, 
but the empirical distribution sets every other number equal 
to one minus the previous. In this case, the K-S statistic will 
have encountered all of the independent numbers while it- 
erating over only half of the range, which in practice speeds 
its convergence. 

Thus by combining a x 2 test with the K-S statistic, we 
can obtain an estimate of both the shape of the distribution, 
which the \ 2 test is sensitive to, and the correlations. 

However, we cannot apply the K-S statistic directly 
with only one realization because the probability distribu- 
tion of Ce depends upon £. We can, however, perform a 
transformation on the random variable Ce into some target 
probability distribution which is easily-estimated. A unit- 
variance Gaussian distribution is ideal here. Such transfor- 
mation between different probability distributions is, for in- 
stance, routine in the generation of pseudorandom numbers. 
This is done by equating the cumulative probability distri- 
butions: 



1 

2^ 



e 2 dx = 



P(C't)dC e . 



(35) 



Note that for pseudorandom number generation, the source 
distribution is usually the uniform distribution on the inter- 
val [0, 1), for which the cumulative distribution is simply the 



random number itself. This sometimes masks the fact that 
what is being done is the equating of two cumulative distri- 
butions, making it appear as if one cumulative distribution 
is being equated to a random number. Our situation is not 
quite that simple. 

Since the left hand side is the cumulative distribution 
of a unit-variance Gaussian distribution in one variable, we 
know the x\ wm follow a x 2 distribution with one degree 
of freedom. We convert to a Gaussian distribution first and 
then to a \ 2 distribution because it is much simpler to in- 
vert the cumulative Gaussian distribution. The astute reader 
may note that the left hand side of the equation is simply 
related to the error function, and that we can compute Xe 
using the inverse of the error function as follows: 



Xl = 2 




P(C'e)dC' e - 1 



(36) 



If our approximation to the full probability distribution 
of Ce is sufficiently accurate, then the Xe f° r each £ follows 
a x 2 distribution with one degree of freedom. That is, 



P(xt) 



v^od 



(37) 



For the inverse of the error function, we make use of an ap- 
proximation with elementary functions pr oduced by Serge i 
Winitzkrl using methodology described in Winitzki (20031, 



erf (x) 



/(*) 



S gn(x)\ J (f(x)Y 



2 ln(l - x 1 
ira 2 



ln(l - x 2 



fix), (38) 



(39) 



using the value of o = 0.147 to keep the relative error at 
around the 10 -4 level for all values of the argument of the 
inverse error function. 

Note that while there are more computationally effi- 
cient methods to convert from some given probability dis- 
tribution to a unit- variance Gaussian, they typically involve 
tricks such as combining two random numbers in non-trivial 
ways, or throwing away some random numbers. Neither of 
which would be an ideal solution here, as we only have a lim- 
ited number of Ce numbers to transform, and mixing mul- 
tiple Ce values would make the outcome for correlated or 
anti-correlated Ce values difficult to predict. By using this 
direct transformation, the transformation reduces to what 
accounts to mostly a rescaling of the distribution of Ce (as 
X is monotonically increasing along with Ce). By just rescal- 
ing the distribution, we are likely to retain any correlations 
that exist, especially at higher £ where the rescaling is nearly 
uniform as P{Ce) approaches a Gaussian distribution. 

Finally, in order to determine whether or not our com- 
puted value of the K-S statistic is reasonable, we compare 
this K-S statistic to a set of 10 4 realizations of 2999 x 2 ~ 
distributed independent random variables, quantifying the 
result using the fraction which provide a K-S statistic greater 
than the K-S statistic of our simulation result. 

The results of these tests are that at the level of a single 



https : //sites . google . com/site/winitzki/sergei-winitzkis-f iles/ 



erf- approx . pdf 
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Figure 3. These figures compare the power spectra of the simulation input and of the component-separated results. Note that the 
debiasing which was performed assumed perfect knowledge of the foreground plus noise signal as well as input CMB, which significantly 
reduces the scatter, particularly at high £, compared to what would be available in an actual experiment. However, these plots are useful 
to visually demonstrate the validity of Equation (I33B when applied to a realistic simulation. When used in an actual experiment, this 
equation would require factoring in the uncertainties in the foreground plus noise signal in some other manner. The primary conclusion 
here is that the mask, which removes 3.7% of the sky, does not impact the results in a manner which is clearly visible at the power 
spectrum level. 





X 2 test 






x 2 


expected x 2 


Analytical 
Gaussian 


2952 
2956 


2999 ± 77.4 
2999 ± 77.4 


K-S test 


Analytical 
Gaussian 


K-S statistic (D a ) 
0.0138 
0.0133 


P(D > Da) 
0.60 
0.65 



Table 1. The results of these statistical tests show that the prob- 
ability distributions of the resultant Ci values are in good accor- 
dance with the expected distributions to within the sample vari- 
ance given by the £ range used. The K-S statistic found is within 
the one-sigma significance for both the Analytical and Gaussian 
estimates, indicating no significant unaccounted for net correla- 
tions or anti-correlations in the data. The expected error on the 
X 2 test comes from the variance of the x 2 distribution, which is 
twice the number of degrees of freedom. The probability P(D > 
D a ) is the probability of obtaining a higher K-S statistic, as esti- 
mated by 10 4 simulations of sets of 2999 x 2_ distributed random 
variables. Values close to 0.5 are expected. 



realization of the sky and over an £ range from 2 — 3000, 
both the K-S statistic and the % 2 statistic lie within the 
68% confidence limits. These results are shown in table [I] 



4 DISCUSSION 

In this paper, we have presented a new likelihood function 
which can be used to estimate the likelihood of the CMB 
power spectrum given the data and a foreground model. 
Of particular interest is that with the simulations used in 
this paper, Planck data shows very little dependence on the 
foreground model until £ > 2000 or so, as shown in Figure H] 
If true, this would be an indication that Planck has adequate 




Figure 4. Comparison of the foreground and noise bias compo- 
nents against the debiased CMB power spectrum and an estimate 
of the RMS error. This estimate of the RMS error is based upon 
Equation | |14[ l , which adds a very rough estimate of the error due 
to the uncertainties in the foregrounds plus noise, assuming the 
foregrounds plus noise have a known power spectrum and are 
Gaussian-distributed. Because the foregrounds are negligible at 
low £ compared to cosmic variance, an accurate diffuse foreground 
model is probably not necessary. With the foregrounds becoming 
significant compared to the overall uncertainty at around £ > 900, 
an accurate compact source model is required, as the level of the 
foreground residuals becomes larger than the total expected error 
in the power spectrum. This supports the use of PSM-like simula- 
tions for diffuse foregrounds, with the uncertainties ignored, com- 
bined with a physical model for the compact foregrounds whose 
parameters are estimated along with the cosmological parameters. 



frequency coverage for the level of diffuse foregrounds in 
temperature data. 

For £ > 1000 or so, however, it is notable that in order 
to match the simulations, we need not only a model of in- 
strument noise, but also the foreground model. This is an 
indication that compact sources remain a significant factor 
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in the data. From these tests, it would be perfectly reason- 
able to make use of a fixed Galactic foreground model com- 
bined with a power spectrum-based compact source model, 



such as that used in Millea et al. (2011 1. This is very much 



an ideal situation because the parameters that impact the 
power spectra of compact sources can be very well-estimated 
at the power spectrum level. 

The most accurate way of including the instrument 
noise contribution would probably be to pre-compute the 
spherical harmonic transforms of a set of simulations of the 
full instrument noise. While the number of noise simulations 
is likely to be much smaller than the number of Monte Carlo 
simulations in the full parameter estimation analysis, a thou- 
sand or so noise simulations should be sufficient to effectively 
marginalize over instrument noise. This method would have 
the advantage of incorporating the full instrument noise co- 
variance, if sufficiently-detailed noise simulations are used, 
albeit in an approximate manner. Care would need to be 
taken to ensure that the use of the simulations within the 
parameter-estimation MCMC chain do not require signifi- 
cant computing resources. One might also imagine simpler 



approximations such as that used in Equation ( 14 1 



Therefore we would recommend making use of the prob- 
ability function P(Ce\Ce,Ne), where Nt = l/e i 'NJ 1 e, with 
Nf being the estimated covariance of the instrument noise 
plus foreground model which depends upon any foreground 
parameters which are to be marginalized over. 

Once the details of the foreground plus noise models are 
determined, the work in this paper demonstrates that pro- 
vided the mask removes very little of the sky, the likelihood 
function we supply provides a reasonably accurate estimate 
of the true likelihood of the CMB in the presence of fore- 
grounds. However, it does have difficulties at low multipoles, 
as can be seen through a cursory examination of Equation 
(TTp . In particular the second term, (1 — n c )/(2£ + l){Ce), 
represents the statistical component of the ILC bias. Note 
that the denominator is the number of ae m terms that go 
into the average for this power spectrum component. 

Perhaps a useful comparison of how significant this bias 
term is would be to compare it against the fact that other 
power spectrum estimation codes use a much smaller frac- 
tion of the sky in their analysis with a similar impact on the 
error that results. If we are to compare against a method 
which uses 80% sky coverage, for instance, a back-of-the- 
envelope calculation would suggest that we want our sta- 
tistical bias term to be no more than about 15% or so of 
the power spectrum, indicating we should have a better es- 
timate of the errors for roughly £ > 50 as compared with 
such a hypothetical method. While some may be nervous 
about trusting our estimate of the errors in the intermedi- 
ate £ range of around 50 < I < 400 or so, our simulations 
indicate that the residual foreground contamination in this 
range is more than an order of magnitude below the esti- 
mated total error in the power spectrum. 

Additionally, it might be worthwhile to consider binning 
the power spectrum in order to reduce the magnitude of the 
statistical bias term. However, binning the power spectrum 
does destroy some information about the CMB power, and 
it may potentially increase the foreground bias term, so we 
do not consider that here. One additional possibility is a 
hybrid method: estimate the ILC weights on bins in I, but 
extract the power spectrum separately at each I. This has 



the benefit of reducing the statistical bias term while retain- 
ing full resolution on the CMB power. However, it has the 
drawback that it likely increases the foreground bias term, 
while at the same time inducing correlations between each 
Ci within a bin. 

While it is in principle possible to estimate the correla- 
tion within each bin, at very low-£ the likelihood function is 
non-Gaussian, and it is an extremely non-trivial problem to 
consider both the correlations and the non-Gaussianity of 
the low-<? power spectrum at the same time. There is, how- 
ever, an intermediate range in £, around £ = 100 to £ = 400 
or so, where the statistical bias term is non-negligible but 
the likelihood function highly Gaussian where this approach 
may be tractable. But due to the work involved for what is 
likely to be a very small improvement in the CMB uncer- 
tainty (of the order of a few percent in the variance in Ci), 
we do not consider this approach here. 

Another potential question is why we do not suggest 
simply making use of a model of the foregrounds given by 
the data minus the estimated CMB. As we show in Ap- 
pendix IB] using this as the foreground model gives identi- 
cally zero for the estimate of the foreground contamination. 
In fact, this calculation places into doubt any sort of "boot 
strapping" -type estimation of the foreground subtraction er- 
rors: it is absolutely required to have some physical model 
of the foregrounds in order to estimate the foreground sub- 
traction errors. 

This paper also begs the question that if we require a 
foreground model such as that used in Millea et al. (20111, 



why go through the work of implementing the ILC at all? 
Why not simply make use of the parametric model of the 
foregrounds? Our argument here is that because the ILC 
method significantly reduces the foregrounds prior to the 
likelihood analysis, this method will be less sensitive to the 
precise details of the foreground model used, permitting a 
greater level of uncertainty in the foreground model before 
cosmological parameters are impacted. This is particularly 
the case at high I when we are no longer limited by cosmic 
variance. 
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APPENDIX A: VARIANCE CALCULATION 



In order to compute the variance of the estimated power spectrum, we start with Equation ( 13 1, the square of which is 



8Cf 



i — 

e t N7 1 f, + fjN7 1 e\ 2 + /VN^fif^N^V ~ 
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This equation seems quite daunting, however when we take the expectation value, many of the terms turn out to be zero. The 
first aspect which we exploit is that only even products of the aim coefficients have non-zero expectation value. Note that fe is 
linear in aim, while Ci goes as the square of ag m . Additionally, because N^ is a real, symmetric matrix, e'NT 1 !^ = fjNT e. 
Combining these two statements along with (SCi) = gives: 
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From here, we need to individually compute the expectation values of each component. This is done using the known 
expectation values of the aim coefficients: 

( a em i a.£m 2 } — (Ci)Sm im2 , (A3) 
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Here we describe in detail the calculation of one of the more complicated terms, and simply list the answers for the remaining 
terms. Perhaps the most complicated term is the final one, which requires taking the expectation value of the following: 
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We expand the square in this particular way in order to ensure that when expanded fully, the at m coefficients would have the 



same order of complex conjugation as in Equation ( A4|. The full expansion is done by using the definition of fi in Equation 



(|8j). Note that the conjugate transpose of f^ is then defined as follows: 
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ft = J2 «*m*L- ( A6 ) 



With these definitions, Equation ( A5 I becomes: 
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(A7) 



We can then make use of the definition of N/, which is: 
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(A8) 



as well as Equation (A4l and the fact that f/ m = f^_ m to give: 
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Since the trace of the identity matrix is simply the number of channels, this expectation value simplifies to: 
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The expectation values of the remaining terms in Equation ( A2 l are: 
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Substituting these expectation values into Equation |A2l gives: 
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APPENDIX B: ON THE NEED FOR A PHYSICAL MODEL 

There are significant difficulties in developing an accurate physical model for the foreground signal. For diffuse Galactic 
foregrounds, variation along the line of sight of the foreground properties (such as the dust temperature) have the potential 
to significantly complicate the modeling of the foreground. For compact sources, the spectra of these sources may vary 
significantly depending upon the source, and may show some variation over the age of the universe as well, which would make 
the process of validating compact source spectra by using the nearby bright sources a questionable proposition. Therefore, an 
absolutely ideal situation would be one in which we are able to ignore the potential difficulties of finding an accurate model for 
foreground emission and are able to extract the primary CMB signal without worrying about the details of the foregrounds. 
This may be one of the motivating factors for the use of so many blind component separation methods, such as the many 
variations of ILC and ICA that have been used. In fact, it is often impressive how well these component separation methods 
do at removing the foregrounds. However, as we argue is the case here for ILC in harmonic space, there is no way to estimate 
the actual errors in the removal without a physical model. To do this, we examine the case where instead of having a physical 
model for the covariance of the foregrounds N^ , we make use of an estimate of the foreground signal given by: 
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Here £g m is the spherical harmonic transform of our estimate of the foreground signal at each observation frequency, d^ m is 
the spherical harmonic transform of the data at each frequency, and ae m is the spherical harmonic transform of our estimate 
of the CMB sky through harmonic-space ILC. 

With these definitions, it is easy to show that the covariance matrix at each i of our estimate of the foreground signal is 
given by: 

N, = C t -ee^Ct. (B2) 

To understand how the use of this estimate of the foreground signal affects our results, we need to use this estimate for 
the term that encodes the effect of the foreground power spectrum in our bias and variance estimates, l/e*NJ e. To facilitate 
the calculation of this statement, we write N< by expanding Ce in terms of its CMB and foreground components as follows: 

Ni = ee t (C/-&)+efj'+fce + +Ni. (B3) 
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Because of the similarity of Equation (B3| with 19]), we can simply write down the result as a slight modification of Equation 
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= C e - G t = 0. (B5) 

Thus, by using this simplified estimate of the foreground signal, all information about the foreground contamination, both 
in the bias and variance, is destroyed. While we could, in principle, add something to Ci to make it so that l/e'N + e 7^ 0, 
the difference between l/e t N + e and zero will just be a function of the arbitrary offset and not provide us with any new 
information about the foreground signal. 

Furthermore, though our calculation is performed on a term which was derived assuming uncorrelated CMB samples, 
|Delabrouille et ah] ( |2009[ ) argue that this sort of calculation can be used as an approximation to the true bias even with 
correlated samples, at least within the Needlet framework. Therefore, to first order, the effect of the foreground contamination 
is canceled in the case of correlated CMB samples. Higher-order terms arising from the correlations may cause some information 
about the bias to be recovered, but with most of the information destroyed by this method, it is unlikely to be useful. This 
has direct implications upon the bias calculated by the WMAP team in|Hinshaw et al" (|2007[>, as their estimation of the bias 



through simulations directly mirrors the methodology of the calculation performed in this Appendix 3 ! This indicates that the 
bias used in computing the ILC map published by the WMAP team likely misses most of the actual ILC bias, and may have 
higher-order contributions which are not understood and may, in certain cases, even have the wrong sign. Similarly, |Kim,| 



Naselsky & Christensen ( 2008 1 also makes use of an estimate of the foregrounds as being the data minus their ILC-estimated 
CMB, and therefore likely underestimates the bias as well. 

This calculation seems to argue that in order to obtain an accurate estimate of the errors and biases in the ILC method, 
it is fundamentally required to have a physical, parametric model of the foregrounds. It is, in principle, possible to make 
use of simulated foregrounds. However as our likelihood function in Equation ( |22[ ) demonstrates, the probability distribution 
of Ct depends both upon the cosmological model and upon the foreground model, and it is very difficult to vary both in a 
simulation environment. There is the additional concern that some of the foregrounds, such as SZ clusters, would also depend 
upon the cosmological model in a detailed treatment. 



3 Though their paper mentions the MEM model was used for the foreground estimate, correspondence with the WMAP team combined 
with our own empirical studies confirm that what was actually done was to simply subtract the ILC estimate from the sky maps to 
produce their estimate of the foregrounds. 



